SYSTEM AND METHOD FOR CONTROLLING ROUTING IN A VIRTUAL ROUTER SYSTEM

Field of the Invention 

The invention relates generally to computerized networks, and more 
specifically to a system and method for controlling routing in a virtual routing 
system. 

Background of the Invention 

Computer networks are becoming increasingly important to the way 
computers are used for business, recreation, and communication. The ability of a 
computer network to easily and efficiently move data from a sender to the 
intended destination is critical to the usefulness of computer networks, and to
their ability to handle the large amount of varying traffic that is encountered in
modern network environments.

Networks are often characterized as local area networks (LANs) or wide 
area networks (WANs). LANs typically comprise anywhere from a few 
computers sharing a common network to large groups of computers located 
physically near each other such as an entire building's network. WANs are 
larger in scope, and include networks that have geographically dispersed 
computers such as the Internet. Networks can be further characterized by the 
types of data that they carry or the protocols they use, such as IPX networks that 
are often found in Novell local area networks, and TCP/IP networks that are 
often found in the Internet and in other LANs and WANs. Also, different 
physical network connections and media such as Ethernet, Token Ring, 
Asynchronous Transfer Mode (ATM), and Frame Relay exist, and can be carried 
over copper, optical fiber, via radio waves, or through other media. 

Networks of different types or that are geographically dispersed can be 
interconnected via technologies such as routers, switches, and bridges. Bridges 
simply translate one network protocol to another and provide a communications
"bridge" between different types of networks. Switches allow connectivity of a
number of switched devices on a network to a single network connection, and in
effect filter and forward packets between the network connection and the various
attached devices. Routers typically do little filtering of data, but receive data
from one network and determine how to direct the data to the intended
destination networked device. Routers typically use headers of a packet such as
an IP packet header for Internet communication to determine the intended
destination for a packet, and communicate with other routers using protocols such
as the Internet Control Message Protocol (ICMP) to determine a desired route
for a packet to travel from one network device to another. Routers therefore are
primarily responsible for receiving network traffic and routing it across multiple 
LANs or across a WAN to the intended destination. 

Data packet routing is a critical element of network performance, and can 
become a problem if large local area networks send a lot of network traffic 
through a single router connection to other networks. Factors such as 
transforming data of one type or in one protocol to another protocol or format 
can require significant processing, and serve to further tax the ability of routers 
to connect various types of networks. Some routers incorporate multiple 
processors to handle different data protocols and formats, and are configured by 
the manufacturer by specially configuring the hardware or by hard-coding 
elements of software to meet specific requirements of a specific customer 
application. Unfortunately, using such a router in a changed environment is 
often less than optimal, and reconfiguration of the router would require re-coding 
the control software or replacement of hardware elements. Further, performance 
of the various functions performed on each packet in a stream of packets is often 
not optimal, both because certain parts of the packet forwarding process are 
repeated and because the various resources available may not be allocated in a 
manner efficient for some situations. 

It is therefore generally desirable to have a system or method for 
controlling routing of network data that provides efficient configuration of 
routing functionality and that optimizes use of available resources.

Summary of the Invention

A system for applying one or more functions to network data packets in a
virtual router is provided. A packet comprising part of a packet flow is received,
and the packet is evaluated to determine which of the one or more functions are
to be applied to the flow. The results of the evaluation are stored in a record, and
the functions indicated in the stored record are applied to subsequent packets in
the packet flow.

Brief Description of the Figures

Figure 1 shows a block diagram of the Internet Protocol Service
Generator router architecture, consistent with an embodiment of the present
invention.

Figure 2 shows a block diagram illustrating packet flow in the Internet
Protocol Service Generator, consistent with an embodiment of the present
invention.

Figure 3 shows a block diagram illustrating operation of the Internet
Protocol Network Operating System in the context of the Internet Protocol
Service Generator, consistent with an embodiment of the present invention.

Figure 4 illustrates the hardware architecture of a packet forwarding
engine, consistent with an embodiment of the present invention.

Figure 5 illustrates the forwarding data structures stored in system
memory and the packet forwarding ingress and egress processing method of one
embodiment of the present invention.



Detailed Description

In the following detailed description of sample embodiments of the
invention, reference is made to the accompanying drawings which form a part
hereof and in which is shown by way of illustration specific sample
embodiments in which the invention may be practiced. These embodiments are
described in sufficient detail to enable those skilled in the art to practice the
invention, and it is to be understood that other embodiments may be utilized and
that logical, mechanical, electrical, and other changes may be made without
departing from the spirit or scope of the present invention. The following
detailed description is, therefore, not to be taken in a limiting sense, and the
scope of the invention is defined only by the appended claims.

The present invention comprises in one embodiment a system for 
applying one or more functions to network data packets in a virtual router. A 
packet comprising part of a packet flow is received, and the packet is evaluated 
to determine which of the one or more functions are to be applied to the flow. 
The results of the evaluation are stored in a record, and the functions indicated in 
the stored record are applied to subsequent packets in the packet flow. 
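
By way of illustration only, the evaluate-once, apply-thereafter behavior
described above might be sketched as follows in Python. The names FlowTable,
choose_functions, decrement_ttl and mark_priority, the five-field flow key, and
the example policy are all assumptions made for this sketch and are not taken
from the specification. For simplicity the sketch also applies the selected
functions to the first packet of the flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed flow key: src IP, dst IP, src port, dst port, protocol.
FlowKey = Tuple[str, str, int, int, int]
PacketFn = Callable[[dict], dict]


def decrement_ttl(packet: dict) -> dict:
    """Hypothetical per-packet function: lower the TTL by one."""
    return {**packet, "ttl": packet["ttl"] - 1}


def mark_priority(packet: dict) -> dict:
    """Hypothetical per-packet function: rewrite the TOS byte."""
    return {**packet, "tos": 0x10}


def choose_functions(packet: dict) -> List[PacketFn]:
    """Hypothetical policy, evaluated once on the first packet of a flow."""
    functions = [decrement_ttl]
    if packet["dport"] == 5060:
        functions.append(mark_priority)
    return functions


@dataclass
class FlowTable:
    evaluate: Callable[[dict], List[PacketFn]]
    records: Dict[FlowKey, List[PacketFn]] = field(default_factory=dict)
    evaluations: int = 0

    def process(self, packet: dict) -> dict:
        key = (packet["src"], packet["dst"], packet["sport"],
               packet["dport"], packet["proto"])
        if key not in self.records:
            # First packet of the flow: evaluate it and store the result.
            self.records[key] = self.evaluate(packet)
            self.evaluations += 1
        # Apply the functions named in the stored record.
        for fn in self.records[key]:
            packet = fn(packet)
        return packet
```

In this sketch the policy runs once per flow; later packets with the same key
reuse the stored record rather than being evaluated again.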



Application of these functions occurs in one embodiment in the context
of a virtual router operating on a user-configurable and scalable virtual router
system. Examples of such a system are described in detail herein to provide
context for understanding operation of the invention, but are only examples of
one of many possible implementations of the present invention.

Figure 1 shows a block diagram of the Internet Protocol Service
Generator (IPSG) router architecture, consistent with an exemplary embodiment
of the present invention. The IPSG architecture manages switching, routing, and
computing resources within a user-configurable hardware router architecture.
The architecture provides user-level service customization and configuration,
and provides scalability for future expansion and reconfiguration. The IPSG,
shown generally in Figure 1, comprises one or more virtual routing engines 101
that provide routing capability in the virtual services environment of the IPSG
architecture. One or more virtual service engines 102 provide packet processing
capability in a virtual services environment. The advanced security engine 103
provides processing capability specifically directed to security functionality for
security protocols such as IPSec. The functions provided may include, but are
not limited to, 3DES/RC4, SHA, MD5, PKI, RSA, Diffie-Hellman, or other
encryption, decryption, or verification functions. Midplane interface 104
provides connectivity between the IPSG and other system hardware. These
elements are tied together by service generator fabric 105, which manages and
controls the other elements of the IPSG. The line interface 106 provides
connectivity between the IPSG and one or more networked devices via one or
more types of network connection. The network connection types may include,
but are not limited to, Gigabit Ethernet, DS3/E3, POS, and ATM.

In some embodiments of the invention, multiple IPSG modules can be 
installed in a single router hardware chassis, and can provide functionality that 
supports a variety of network connection interfaces and protocols as well as a 
scalable increase in routing capacity. 

Figure 2 shows a block diagram illustrating flow of a typical example
packet in the Internet Protocol Service Generator, consistent with an
embodiment of the present invention. At 201, a packet is received via the
network connection line interface, and is directed by the flow manager 202
which utilizes a steering table to determine which Virtual Local Area Network
(VLAN) data is sent to which Virtual Routing Engine (VRE). The flow manager
202 tags the packet with an internal control header and transfers it across the
service generator fabric 203 to the selected VRE at 204.
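
As a rough sketch of the steering step just described, the following Python
fragment maps a VLAN identifier to a VRE and attaches the result to the packet.
The class and field names (FlowManager, TaggedPacket, vre_id) and the choice to
return None for an unknown VLAN are assumptions for illustration; the layout of
the internal control header is not given in this description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TaggedPacket:
    vre_id: int     # stands in for the internal control header (assumed)
    vlan_id: int
    payload: bytes


class FlowManager:
    def __init__(self, steering_table: Dict[int, int]) -> None:
        # Steering table: VLAN ID -> Virtual Routing Engine ID.
        self.steering_table = dict(steering_table)

    def steer(self, vlan_id: int, payload: bytes) -> Optional[TaggedPacket]:
        vre_id = self.steering_table.get(vlan_id)
        if vre_id is None:
            # Assumption: traffic on an unconfigured VLAN is not forwarded.
            return None
        return TaggedPacket(vre_id=vre_id, vlan_id=vlan_id, payload=payload)
```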

Upon arrival at the VRE, the packet enters a virtual services controller
205 for packet classification. Various packet fields such as IP source and
destination, UDP/TCP source and destination port numbers, IP protocol field,
TOS field, IPSec header, and SPI field information are extracted. A flow cache
is checked to determine whether the packet is to be processed in hardware or in
software, and the packet is routed accordingly. In this example, the packet is to
be processed in hardware, and so is passed on to main memory 206 from which
it can be accessed by Virtual Routing Processor (VRP) 207. The VRP retrieves
the packet, identifies the packet processing actions that can be achieved in
hardware, and performs those processes, which include such things as checksum
adjustment, time-to-live adjustment, and other packet actions.

The example packet is then forwarded to the Advanced Security Engine
(ASE) 208, where the packet is encrypted. The ASE performs the encryption
and prepends an IPSec tunnel header to the packet before routing the packet
back to the VRP 207. The VRP then forwards the packet to a second
Virtual Routing Engine (VRE) 209, where a virtual router routes the packet
through network interface connection 210.

Figure 3 shows a block diagram illustrating operation of the Internet
Protocol Network Operating System (IPNOS) in the context of the Internet
Protocol Service Generator (IPSG), consistent with an embodiment of the
present invention. The IPNOS provides customizable subscriber-level IP
services through Virtual Router (VR) elements. The IPNOS creates a VR as an
object group, where the objects include application layer, network layer,
transport layer, data link layer, physical layer, and other objects. For example, a
firewall may exist in a VR as an application layer object, and TCP/IP objects
may exist as transport or network layer objects. Data link layer objects include
VLAN or other such data link objects, and physical layer objects include ATM,
DS3, or other physical layer objects.

These objects comprise various data definitions and methods, and so are
capable of invoking methods in response to events such as the arrival of a data
packet. These objects can invoke their own methods, or other methods from
other objects, and so can interact with each other such as to perform task sharing.
One element of each object is the type of processing it requires to execute. The
object manager can then draw from available resources to provide the
appropriate processing, and can manage the various resources such as the
engines of Figures 1 and 2 to draw from resources tailored to a specific function.

The line interfaces and the network module 301 in Figure 3 are tailored
to handle data link and physical link layer tasks, such as providing a virtual
interface 302 and virtual layer 2 switch 303. The Virtual Service Engine 304 is
tailored to provide specific application layer, presentation layer, session layer,
and transport layer functions, such as an application layer firewall 305 or an
anti-virus module 306. The Advanced Security Engine 307 provides IPSec
encryption, decryption, and verification via a module 308, which operates on
network layer objects to provide security functionality. The Virtual Routing
Engine 309 provides routing services 310, network address translation 311,
Multi-Protocol Label Switching (MPLS) 312, and other network and transport
layer functions. Because VR requests for a new object or resource are managed
by the IPNOS, the IPNOS can dynamically allocate resources to optimize
utilization of available processing resources.

Figure 4 illustrates the hardware architecture of a packet forwarding
engine, consistent with an example embodiment of the present invention. The
packet forwarding engine performs hardware-assisted packet forwarding for a
variety of network and transport layer packets, and includes functions such as
flow cache route lookup forwarding and IP/MPLS forwarding of packets as well
as packet header processing functions. The packet forwarding engine of Figure 4
is partitioned into ingress and egress portions, both for the switch fabric data
interface and for the DMA memory interface.

Packets are received at the switch fabric interface 401, and are forwarded
to one of a plurality of ingress processors 402. The ingress processors are
specially microcoded for ingress processing functionality, just as egress
processors 403 are specially microcoded for egress processing. In one
embodiment of the invention, each ingress processor 402 operates on one
incoming packet and each egress processor 403 operates on one outgoing packet,
and hardware interlocks maintain packet order.

The packet forwarding engine ingress processors pass the packet
forwarding state parameters to the DMA engine or DMA interface ingress 404,
which incorporates these state parameters into the packet receive descriptor. This
forwarding state indicates whether the processor should software forward the
packet or whether the packet can bypass software processing and can be
hardware processed. The forwarding state also includes an index into a
forwarding transform cache that describes packet forwarding engine processing
applied to each type of received packet.

For software forwarded packets, the receive descriptor for the packet is
pushed into a DMA ingress descriptor queue such as in memory 405. Then, the
software processing is performed in processor 407, and the result of processing
the packet receive descriptor is routed to the DMA interface egress 406 as a
packet transmit descriptor. For hardware forwarded packets, the receive
descriptor bypasses the ingress descriptor queue and is pushed directly onto a
DMA egress descriptor queue associated with the DMA interface egress module
406 as a packet transmit descriptor via a hardware forwarding engine.
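
The two descriptor paths described above can be modeled, in a simplified and
non-authoritative way, with a pair of queues. The names Descriptor, DmaQueues,
receive and run_software are hypothetical, and the software processing step is
represented by an arbitrary callable supplied by the caller.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class Descriptor:
    packet_id: int
    software_forward: bool  # forwarding state set by the ingress processor
    transform_index: int    # index into the forwarding transform cache


class DmaQueues:
    def __init__(self) -> None:
        self.ingress: Deque[Descriptor] = deque()  # waits for the processor
        self.egress: Deque[Descriptor] = deque()   # ready to transmit

    def receive(self, desc: Descriptor) -> None:
        if desc.software_forward:
            # Software path: queue the receive descriptor for the processor.
            self.ingress.append(desc)
        else:
            # Hardware path: bypass the ingress queue entirely.
            self.egress.append(desc)

    def run_software(self, process: Callable[[Descriptor], Descriptor]) -> None:
        # The processor drains the ingress queue and emits transmit descriptors.
        while self.ingress:
            self.egress.append(process(self.ingress.popleft()))
```

Note that in this toy model a hardware-forwarded descriptor can reach the
egress queue ahead of an earlier software-forwarded one, which is the kind of
reordering the specification later addresses with the DropCpuPkt bit.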

Figure 5 illustrates in greater detail the forwarding data structures stored
in system memory at 501, and illustrates the packet forwarding ingress and
egress processing method at 502. The data structure elements and ingress and
egress processing are described in greater detail in a specific embodiment of the
present invention described later in this document.

While the hardware forwarding engine on the IP Service Generator 
provides the fundamental packet forwarding functions and capability, IPNOS, or 
any other network operating system, needs to be able to take advantage of this 
capability to relieve itself of the burden of providing the basic forwarding and 
other IP services. The Packet Forwarding Engine Driver (PFED) API provides 
IPNOS a flexible interface to the PFE hardware. 

The Hardware Forwarding Engine can operate either in Prefix mode or
Flow mode. In Prefix mode, forwarding is based on some number of bits of
the destination IP address of the packet itself; no other IP services such as
filtering are available. In Flow mode, forwarding is still based on the destination
address but, for the purpose of providing IP services, packets are classified into
"flows," each flow being characterized by many parameters associated with it.

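To make the Prefix mode behavior concrete, the following Python sketch
forwards purely on leading bits of the destination address. Choosing the
longest matching prefix is an assumption of this sketch (the description only
says that some number of destination bits is used), and the function name
prefix_lookup and the route table contents are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def prefix_lookup(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Return the next hop of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_len = -1
    best_hop = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        # Only the destination address is consulted; no filters, NAT or QoS.
        if addr in network and network.prefixlen > best_len:
            best_len = network.prefixlen
            best_hop = next_hop
    return best_hop
```

A Flow mode lookup would additionally key on the other classification fields
(ports, protocol, and so on) so that per-flow services can be attached.
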
The PFE driver, as well as the IP stack, treats the first packet of each new
flow specially, because it is used to gather information about any packet
filters, NAT rules, QoS, Metering and IP forwarding functions
that the user has chosen for this flow. There are three major elements to the
process of pushing flows into hardware by the PFE driver:

(a) New flow identification 

(b) Learning 

(c) Flow setup 

Additionally, PFE also supports an API to accomplish: 

(d) CPU Forwarding Bandwidth Allocation 

(e) PFE Forwarding Bandwidth Allocation

These two features of the PFE/PFED are powerful tools that allow
creating virtual routers whose software or hardware forwarding bandwidth
allocation remains unaffected by other virtualized routers in IPNOS.

New flow identification and flow setup for a new flow are transparent to
IPNOS except that the software, given a learning packet, must either send the
packet out or decide to terminate the flow. Both these cases are supported by the
API.

Learning is accomplished as the packet traverses the software IP
forwarding stack using the PFED API functions. The information collected is
held in a buffer referred to as an 'annotation buffer' which is allocated and
attached to all learning packets before being passed to the software stack. Flow
setup is automatically handled by the PFE driver when a learning packet is being
forwarded.

Even though packet forwarding is handled by the PFE, the user may wish
to have policies that specify that some flows be handled in software. In order to
ensure that one virtualized IPNOS router is not starved by another more active
router, the user may specify a certain CPU resource level per router at the time of
its creation.

The PFE driver provides an interface to allow the user to allocate the
PFE's forwarding capacity, much like the CPU bandwidth allocation, to ensure
that one active router doesn't consume all the hardware PFE resources.

The API itself is broadly broken down into the following areas:

1. Statistics - The basic mechanism available to the OS to collect
statistics is via a Statistics Control Block (SCB), which is allocated by the PFE
driver and then associated with the Ingress or Egress side.

2. Filter functions - Once it is determined that packets belonging to a 
particular flow qualify for discard action, the user can tag them such that they are 
discarded in the PFE itself. 

3. QoS functions - The purpose of this function is to allow software to
map IP QoS to different traffic classes.

4. Metering Functions - Metering functions allow the OS to apply QoS
at the traffic level such that traffic for a given flow doesn't exceed the
provisioned traffic parameters. As with statistics, one needs to create a Metering
Control Block (MCB) and associate the MCB with a flow such that the PFE can
support metering.

5. NAT/IP/MPLS forwarding - This set of functions allows the PFE 
driver to capture basic IP forwarding functions and NAT specific parameters. 

6. Software Forwarding - Packets belonging to some flows may need to 
be always handled in software as determined by criteria set by the user. This is 
accomplished by specifically using the function to tag packets as 
software-forwarded. 

7. IPSEC flows - Packets that need to be encrypted/decrypted need to be 
processed appropriately to allow the PFE driver to collect IPSEC-specific 
parameters that are necessary to encrypt/decrypt the packets. 

8. Stateful Packet Filter (SPF) flows - The Stateful Packet Filter feature
allows applications to allow sessions based on policies configured by the user,
which means that the user can take advantage of the PFE hardware to create
events based on TCP flags so that software sees only packets with specified
flags. This requires that software first tag them as SPF-aware flows.

9. Driver Initialization - Since the driver can operate in three different
modes (pass-thru, flow, and prefix mode), the driver exposes a function to allow
the user to initialize the driver appropriately.

10. Receive - A packet received by the PFE must be first passed to PFE 
driver for handling. The driver will call a function to send packets that need to 
be handled outside of the driver. 

11. Transmit - A packet that needs to be forwarded needs to be sent to
the driver for learning termination and forwarding by calling a driver function,
and the driver, in turn, will call a function to send the packet out.

12. PFE/IPNOS Forwarding Bandwidth allocation - The processors and 
the hardware forwarding engine collectively are single resources that are shared 
among all the virtual routers. This API provides the mechanism to distribute 
these resources to ensure fairness. 

In one specific embodiment of the present invention described in the
remainder of this specification in greater detail, the PFE maintains a table of
Transform Control Blocks (TCBs), which direct how the egress controller
processes outgoing packets. The egress controller uses the 20-bit forwarding
index, carried by the DMA descriptor, to select a transform control block from
the table before processing packets. Each transform control block entry contains
64 bytes formatted as described in the table below.



Word  Bits   Name             Description

0     31:28  PktCmd           Packet forwarding command:
                                0: Discard packet.
                                1: Forward packet.
                                2: Return packet to CPU.
                                3-15: Reserved
      27:20  Reserved
      19:16  PktDst           Forwarding destination for the packet:
                                0: Processor Engine
                                1: Security Engine
                                2: Line Interface
                                3: PPPoE Interface
                                4: Tunnel Interface
                                6-15: Reserved
      15:0   PktMTU           Packet MTU.
1     31     NAT_IP           Perform NAT on IP addresses.
      30     DropCpuPkt       If this bit is set and the Pkt desc is HW_COH the
                              packet is dropped.
      29     NAT_TCP          Perform NAT on TCP/UDP port addresses.
      28     ReplaceRM        Replace Rate-Marking field in SF header.
      27     ReplaceID        Replace IP header ID field with incremented PktID.
      26     ValidCRC         Validate IP header checksum.
      25     DecrTTL          Decrement the IP or MPLS header TTL value.
      24     ReplacePRI       Replace Priority field in SF header.
      23:16  TOS/EXP          IP TOS/MPLS EXP replacement value.
      15:8   TOS/EXP Enables  Enables for IP TOS/MPLS EXP replacement. (Set high
                              to replace bit)
      7:4    MPLS Operation   MPLS Operation Code:
                                0: NOP
                                1: PUSH
                                2: POP_PEEK
                                3: POP_FWD
                                4: SWAP
                                5: POP_L2VPN_NULL
                                6: POP_L2VPN_CTRL
      3      PWE3 Enable      PWE3 special case handling of L2 packets.
      2      PWE3 Control     PWE3 control word should be added. Used when CW is
                              "optional".
      1:0    Reserved
2     31:0   StatsOutPtr0     Memory pointer to egress statistics block 0.
3     31:0   StatsOutPtr1     Memory pointer to egress statistics block 1 (Always
                              assumed enabled).
4     31:16  HdrOffset        Indicates the number of bytes before the start of
                              payload when an application specific header is
                              located. Used for PPPoE. Also used for detunneling,
                              indicates the number of ...
      15:0   HdrLen           Byte length of the transform header.
5     31:0   HdrPtr           Memory pointer to the transform header data.
6     31:0   NAT_IPSrc        IP source address NAT replacement value.
7     31:0   NAT_IPDst        IP destination address NAT replacement value.
8     31:16  NAT_TCPSrc       TCP/UDP source port NAT replacement value.
      15:0   NAT_TCPDst       TCP/UDP destination port NAT replacement value.
9     31:0   PktIdPtr         Memory pointer to packet ID value.
10    31:0   MeterOutPtr0     Memory pointer to egress metering control block 0.
11    31:0   MeterOutPtr1     Memory pointer to egress metering control block 1.
12    31:8   Reserved
      7:0    EgressQosIndex   Mode and memory pointer to the egress QOS
                              translation table.
13    31:0   L3 Header Ptr    Memory pointer to the L3 encapsulation header.
14    31:0   L3 Header Size   Size of the L3 encapsulation header.
15    31:16  FCBTag           The value of the corresponding FCB pending tag must
                              be written here to associate the TCB with the flow.
                              A value of 0 needs to be written in prefix mode.
      15:0   TCPChkAdj        TCP Checksum adjustment for TCP transforms.

Table 10 Transform Control Block
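
As an illustrative sketch only, word 0 of a Transform Control Block can be
packed and unpacked with shifts and masks that follow the bit positions listed
in Table 10 (PktCmd in bits 31:28, PktDst in bits 19:16, PktMTU in bits 15:0).
The function names encode_tcb_word0 and decode_tcb_word0 and the lower-case
labels are hypothetical. PktDst value 5 does not appear in the table and is
reported as reserved here, which is an assumption.

```python
PKT_CMD = {0: "discard", 1: "forward", 2: "return_to_cpu"}
PKT_DST = {
    0: "processor_engine",
    1: "security_engine",
    2: "line_interface",
    3: "pppoe_interface",
    4: "tunnel_interface",
}


def encode_tcb_word0(pkt_cmd: int, pkt_dst: int, pkt_mtu: int) -> int:
    """Pack PktCmd, PktDst and PktMTU; reserved bits 27:20 stay zero."""
    return ((pkt_cmd & 0xF) << 28) | ((pkt_dst & 0xF) << 16) | (pkt_mtu & 0xFFFF)


def decode_tcb_word0(word: int) -> dict:
    """Unpack word 0; unlisted command or destination codes read as reserved."""
    pkt_cmd = (word >> 28) & 0xF
    pkt_dst = (word >> 16) & 0xF
    return {
        "PktCmd": PKT_CMD.get(pkt_cmd, "reserved"),
        "PktDst": PKT_DST.get(pkt_dst, "reserved"),
        "PktMTU": word & 0xFFFF,
    }
```

For example, a "forward to line interface with a 1500-byte MTU" entry encodes
to 0x100205DC under this layout.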



To update a Transform Control Block (TCB), host software sends a
control packet containing a PFE_EGRESS_WR message with an address
parameter that points to the new TCB. Software should issue the TCB update
control packet before issuing the packet being forwarded. This ensures that the
forwarded packet is processed according to the updated TCB.





WO 03/103237 PCT/US03/17674 

There are a couple of fields used to maintain packet order and associate the
TCB with a specific flow. In flow mode, where several NEW packets for a flow
could be sent to the CPU, there is a danger that once the CPU updates the TCB
and FCB a packet could be hardware forwarded while the CPU still has packets
for that flow. Packet order used to be maintained by a conflict cache in the DMA
engine, but now it is enforced by the TCB. When the TCB is written, the
DropCpuPkt bit should be zero; this allows the CPU to send the NEW
packets it has for that flow. However, when the first FWD_HW packet is seen
with this bit clear, the forward engine will update the TCB and set this bit.
Subsequent packets from the CPU (recognized because they are marked
FWD_HW_COH) will be dropped.
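
The ordering rule in the preceding paragraph can be expressed as a small state
machine. This is a sketch under stated assumptions: the descriptor types are
modeled as the strings "FWD_HW" and "FWD_HW_COH", and the names Tcb and
egress_accepts are hypothetical.

```python
from dataclasses import dataclass

FWD_HW = "FWD_HW"          # packet forwarded directly by hardware
FWD_HW_COH = "FWD_HW_COH"  # packet sent by the CPU for a flow being set up


@dataclass
class Tcb:
    drop_cpu_pkt: bool = False  # written as zero when the TCB is created


def egress_accepts(tcb: Tcb, desc_type: str) -> bool:
    """Return True if the packet is forwarded, False if it is dropped."""
    if desc_type == FWD_HW:
        # The first hardware-forwarded packet sets the bit in the TCB.
        tcb.drop_cpu_pkt = True
        return True
    if desc_type == FWD_HW_COH:
        # CPU packets pass only until hardware forwarding has started.
        return not tcb.drop_cpu_pkt
    raise ValueError(f"unknown descriptor type: {desc_type}")
```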

There is also a consistency check performed between the FCB and the
TCB. On ingress, the SF header SrcChan is replaced with the PendingTag field of
the FCB; on egress, the SrcChan is compared against the FCBTag field of the
TCB. If the tags mismatch, the packet is dropped. For prefix mode, the SrcChan is
replaced with zero, and the FCBTag field must be initialized to zero.
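
A minimal sketch of this tag check follows. The function names are
hypothetical and the tags are treated as plain integers; the actual field
widths of SrcChan and PendingTag are not stated in this passage.

```python
def ingress_src_chan(pending_tag: int, prefix_mode: bool) -> int:
    """Value written into the SF header SrcChan field on ingress."""
    return 0 if prefix_mode else pending_tag


def egress_tag_matches(src_chan: int, fcb_tag: int) -> bool:
    """Egress consistency check; a mismatch means the packet is dropped."""
    return src_chan == fcb_tag
```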

In its simplest form, the packet header transformation involves the 
replacement of some number of header bytes of an ingress packet with some 
number of bytes of replacement header data. Under the control of a Transform 
Control Block, the PFE egress unit can selectively replace and recompute 
specific fields in a small set of protocol headers. 

The PFE egress unit begins the header transform by stripping the 
incoming packet's SF header along with the number of bytes indicated by the SF 
header offset field. At that point, the controller will begin copying bytes from 
the buffer pointed to by the TCB's HDRPTR field into the egress packet buffer. 
The PFE will copy the number of new header bytes defined by the TCB's 
HDRLEN field. 
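
The strip-and-copy step described above might look like the following in
Python. This is a simplified sketch: the SF header length is passed in as a
parameter because its size is not given here, the buffer pointed to by HdrPtr
is modeled as a bytes object, and the function name replace_header is
hypothetical. Construction of the outgoing SF header is not modeled.

```python
def replace_header(packet: bytes, sf_header_len: int, sf_offset: int,
                   hdr_buffer: bytes, hdr_len: int) -> bytes:
    """Strip the SF header plus sf_offset bytes, then prepend new header bytes.

    hdr_buffer stands in for the memory pointed to by the TCB HdrPtr field,
    and hdr_len for the TCB HdrLen field.
    """
    strip = sf_header_len + sf_offset
    if strip > len(packet) or hdr_len > len(hdr_buffer):
        raise ValueError("lengths exceed the available data")
    remainder = packet[strip:]
    return hdr_buffer[:hdr_len] + remainder
```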

After performing this header replacement, the PFE then goes through the
TCB enable bits to determine what other header transformations need to be
made. The sections below explain some of these transformations.

The PFE uses the TCB HDRLEN field to update the SF header length
field for outgoing packets. By default, the PFE retains the SF header RM (rate
marking) and PRI (priority) fields from the incoming packet in the outgoing
packet. When the associated TCB's ReplaceQOS field is set, the PFE replaces
the incoming RM and PRI fields with the values set in the TCB's header block.
The PFE also replaces the RM field for outgoing packets when rate marking is
enabled in the TCB. In cases where the hardware detects an exception that
requires software processing, the PFE returns the packet to the CPU and sets the
SF header error code to 0x7.

The PFE egress controller supports independent replacement of the IP
source and destination addresses to support IP NAT. It also supports
replacement of the IP Type-of-Service (TOS) field. When enabled, the PFE
egress controller will decrement the IP Time-To-Live Field and can conditionally
replace the IP identification field based on the Transform Control Block's
ReplaceID field. For a particular flow with the TCB ReplaceID field enabled,
the PFE fetches the ID from the memory location pointed to by the TCB's
PktIdPtr field. The PFE increments the stored ID value after it replaces a
packet's ID field.

For each IP header field transform, the PFE computes and applies an
adjustment to the IP header checksum field. With a separate bit in the TCB, host
software can request that the PFE validate the ingress IP header checksum field.
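
One common way to apply such a per-field adjustment without recomputing the
whole checksum is the incremental update of RFC 1624. The sketch below uses
that technique for a TTL decrement; whether the PFE hardware uses this exact
method is not stated in the description, and the function names are
hypothetical. The header is assumed to be a valid IPv4 header with a TTL
greater than zero.

```python
import struct


def fold16(total: int) -> int:
    """Fold carries back in, as required by one's-complement addition."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ip_checksum(header: bytes) -> int:
    """Full one's-complement checksum over an even-length header."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    return ~fold16(total) & 0xFFFF


def adjust_checksum(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 incremental update: HC' = ~(~HC + ~m + m')."""
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold16(total) & 0xFFFF


def decrement_ip_ttl(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and patch the checksum (bytes 10-11)."""
    hdr = bytearray(header)
    old_word = (hdr[8] << 8) | hdr[9]   # TTL and protocol share a 16-bit word
    hdr[8] -= 1                         # raises ValueError if TTL is already 0
    new_word = (hdr[8] << 8) | hdr[9]
    old_csum = (hdr[10] << 8) | hdr[11]
    struct.pack_into("!H", hdr, 10, adjust_checksum(old_csum, old_word, new_word))
    return bytes(hdr)
```

Validating an ingress header with this representation amounts to checking
that ip_checksum over the full header, checksum field included, is zero.
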
If the TCB PktDst field indicates that the packet is destined to the
Security Engine, then the PFE egress controller replaces the security engine
header Fragment Size field. If the TCB ReplaceID field is also set, the PFE
performs packet ID replacement in the security engine header instead of the
egress packet IP header.

If the TCB PktDst field indicates that the packet includes a PPPoE
header, then the PFE egress unit must update the PPPoE payload length field
before transmitting the packet. Software indicates the location of the PPPoE
header by setting the TCB HdrOffset field to the number of bytes between the
start of the PPPoE Header and the start of the L3 packet payload. The PFE
egress unit will then update the last 2 bytes of the 6-byte PPPoE header with the
packet's payload length. It computes the PPPoE payload using the following
formula:

PPPoE Payload Length = L3 Payload Length + TCB HdrOffset Value -
PPPoE Header Length (6 bytes).
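
As a worked example of the formula above, the following sketch computes the
PPPoE payload length and writes it into the last two bytes of a 6-byte PPPoE
header. The function names are hypothetical, and storing the length in
big-endian (network) byte order is an assumption of this sketch.

```python
PPPOE_HEADER_LEN = 6  # bytes, as stated in the formula above


def pppoe_payload_length(l3_payload_len: int, hdr_offset: int) -> int:
    """PPPoE Payload Length = L3 Payload Length + HdrOffset - 6."""
    return l3_payload_len + hdr_offset - PPPOE_HEADER_LEN


def update_pppoe_length(pppoe_header: bytes, l3_payload_len: int,
                        hdr_offset: int) -> bytes:
    """Rewrite the final 2 bytes of the PPPoE header with the new length."""
    if len(pppoe_header) != PPPOE_HEADER_LEN:
        raise ValueError("PPPoE header must be exactly 6 bytes")
    length = pppoe_payload_length(l3_payload_len, hdr_offset)
    return pppoe_header[:4] + length.to_bytes(2, "big")
```

With an HdrOffset of 8 (a 6-byte PPPoE header followed by 2 further bytes
before the L3 payload) and an L3 payload length of 1000, the formula gives
1000 + 8 - 6 = 1002.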

In the event that the hardware detects an exceptional packet that requires
software processing, the PFE controllers will return the packet to the CPU with
the packet's SF Header Error field set to 0x6 and set the SF SrcChId to an error
code. The Switch Fabric Document lists the possible error codes that may be
placed in the SF SrcChId.

The PFE egress unit independently rate limits ingress and egress packets, 
if enabled. As part of rate limiting, the PFE meters, marks and drops packets. 
The PFE performs ingress rate limiting before header transformation and 
performs egress rate limiting after header transformation. Software controls 
metering and rate marking using a combination of Metering Control Blocks 
(MCBs) and fields in the TCB and ingress Statistics Blocks. 

The PFE implements both ingress and egress rate metering and marking 
according to the two-rate three color marker (trTCM) definition in RFC 2698. 
Per this definition, in color-blind mode the PFE marks the drop precedence color 
of a packet as Green if it does not exceed the CBS, Yellow if it exceeds the CBS 
but not the PBS, and Red if it exceeds both CBS and PBS. The packet's color is 
encoded into the rm field of the LQ header. The PFE increments the C and P 
buckets by the CIR and PIR values, respectively, in 1ms intervals. 
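The color-blind marking rule can be sketched in Python as follows. This follows the RFC 2698 color-blind algorithm; the class and method names are hypothetical, and starting with full buckets is the RFC's convention rather than something stated in this description.

```python
from dataclasses import dataclass, field


@dataclass
class TrTcmMeter:
    cir: int  # committed information rate, bytes per 1 ms timeslot
    pir: int  # peak information rate, bytes per 1 ms timeslot
    cbs: int  # committed burst size, bytes
    pbs: int  # peak burst size, bytes
    c_tokens: int = field(init=False)
    p_tokens: int = field(init=False)

    def __post_init__(self) -> None:
        # RFC 2698: both token buckets are initially full.
        self.c_tokens = self.cbs
        self.p_tokens = self.pbs

    def tick(self, timeslots: int = 1) -> None:
        """Refill the C and P buckets for elapsed 1 ms timeslots, capped at CBS/PBS."""
        self.c_tokens = min(self.cbs, self.c_tokens + self.cir * timeslots)
        self.p_tokens = min(self.pbs, self.p_tokens + self.pir * timeslots)

    def mark(self, size: int) -> str:
        """Return the color for a packet of `size` bytes (color-blind mode)."""
        if self.p_tokens < size:
            return "red"
        if self.c_tokens < size:
            self.p_tokens -= size
            return "yellow"
        self.p_tokens -= size
        self.c_tokens -= size
        return "green"
```

A red packet consumes no tokens, a yellow packet consumes only P tokens, and a green packet consumes both.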

The PFE egress unit may optionally drop Yellow or Red packets or may
color packets for a downstream dropper. The RateInCtl and RateOutCtl fields of
the TCB control whether and how to drop packets on ingress and egress rate
limiting.

A set of Metering Control Blocks (MCBs) maintained in system memory
contains per-flow (VR, VI, or ACL) trTCM parameters. Table 11 defines the
MCB data structure. Hardware provides three logical metering units: VI-based
ingress metering, flow-based ingress metering, and flow-based egress metering.
The TCB contains two MCB pointers for flow-based metering. The VI-based
MCB pointer is contained in the VI-based stats block and will be discussed in
more detail below.



Word  Bits   Name                  Description
0     31:0   Green_bytes (lower)   Bottom 32 bits of green-metered byte count.
1     31:0   Ctokens               Number of bytes in C token bucket.
2     31:0   Ptokens               Number of bytes in P token bucket.
3     31:0   Metered_pkts (lower)  Bottom 32 bits of metered packet count.
4     31:0   Yellow_bytes (lower)  Bottom 32 bits of yellow-metered byte count.
5     31:0   Red_bytes (lower)     Bottom 32 bits of red-metered byte count.
6     31:0   Timeslot              1ms timeslot value.
7     31:0   Reserved
8     31:0   CIR                   Committed information rate in bytes/timeslot.
9     31:0   PIR                   Peak information rate in bytes/timeslot.
10    31:0   CBS                   Committed burst size in bytes.
11    31:0   PBS                   Peak burst size in bytes.
12    63:32  Metered_pkts (upper)  Upper 32 bits of metered packet count.
13    63:32  Green_bytes (upper)   Upper 32 bits of green-metered byte count.
14    63:32  Yellow_bytes (upper)  Upper 32 bits of yellow-metered byte count.
15    63:32  Red_bytes (upper)     Upper 32 bits of red-metered byte count.

Table 11 Metering Control Block
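As an illustration of the Table 11 layout, the following Python sketch unpacks a 16-word MCB image and joins the split 64-bit counters. The function name is hypothetical, and the big-endian default is an assumption; the description does not state the memory byte order.

```python
import struct

MCB_SIZE = 64  # 16 words of 32 bits each


def parse_mcb(raw: bytes, byteorder: str = ">") -> dict:
    """Unpack a Metering Control Block laid out as in Table 11.

    `byteorder` is a struct prefix (">" big-endian, "<" little-endian).
    """
    if len(raw) != MCB_SIZE:
        raise ValueError("an MCB image must be exactly 64 bytes")
    w = struct.unpack(f"{byteorder}16I", raw)
    return {
        # 64-bit counters are split into a lower word and an upper word.
        "green_bytes": (w[13] << 32) | w[0],
        "metered_pkts": (w[12] << 32) | w[3],
        "yellow_bytes": (w[14] << 32) | w[4],
        "red_bytes": (w[15] << 32) | w[5],
        "c_tokens": w[1],
        "p_tokens": w[2],
        "timeslot": w[6],
        "cir": w[8],
        "pir": w[9],
        "cbs": w[10],
        "pbs": w[11],
    }
```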

Software controls where and how the hardware accesses MCBs by setting
up arrangements of MCB pointers. The MCB pointer data structure contains a
32-byte aligned memory pointer along with mode control bits as detailed in the
table below. In its simplest form, the pointer field indicates the memory
location of a single MCB. In its most complex mode, the pointer indicates the
location of an ordered array of up to 8 MCB pointers. When the hardware loads
an MCB pointer array, it performs metering and rate marking starting with the
first MCB pointer and continuing as directed by the Next Pointer field in the
MCB pointer. Software can disable rate marking completely by setting all 4
bytes of the MCB pointer to 0. (Note: MCB arrays are not implemented yet.) The
lowest 5 bits should be masked out before using this 4-byte word as the memory
pointer.



Bits   Name            Description
31:5   Memory Pointer  This field contains a memory pointer to an MCB, an MCB
                       pointer array, or a Rate Marking Translation Table. The
                       Metering Mode field determines which mode to use. This
                       pointer must be 32-byte aligned.
4:3    Metering Mode   This field determines to what structure the Memory
                       Pointer field points:
                       0: MCB - Color Blind
                       1: MCB - Color Aware
                       2: MCB Array
                       3: Reserved
2:1    Drop Policy     This field indicates the traffic policing policy:
                       0: No dropping
                       1: Drop on red marking only
                       2: Drop on yellow or red marking
                       3: Reserved
0      Next Pointer    This field indicates whether the hardware should
                       continue to the next MCB pointer in an array:
                       0: Stop after the current pointer
                       1: Continue to the next MCB pointer in the array

Table 12 MCB Pointer Format
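The pointer word in Table 12 can be decoded as in the Python sketch below. The type and function names are hypothetical; the bit positions are taken from the table, and an all-zero word is treated as "rate marking disabled" as the text describes.

```python
from typing import NamedTuple, Optional


class McbPointer(NamedTuple):
    address: int        # 32-byte aligned memory address (bits 31:5)
    metering_mode: int  # 0 color blind, 1 color aware, 2 array, 3 reserved
    drop_policy: int    # 0 none, 1 drop red, 2 drop yellow or red, 3 reserved
    next_pointer: bool  # continue to the next pointer in an array


def decode_mcb_pointer(word: int) -> Optional[McbPointer]:
    """Split a 32-bit MCB pointer word into its fields.

    Returns None when the word is 0, meaning rate marking is disabled.
    """
    word &= 0xFFFFFFFF
    if word == 0:
        return None
    return McbPointer(
        address=word & 0xFFFFFFE0,       # mask out the lowest 5 bits
        metering_mode=(word >> 3) & 0x3,
        drop_policy=(word >> 1) & 0x3,
        next_pointer=bool(word & 0x1),
    )
```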

As a special optimization, software embeds the MCB pointer for the VI-based
ingress metering in a reserved field of the VI-based ingress stats block.
Software must guarantee that this reserved field of the stats block is always
initialized to 0 in the case where metering is not enabled.

The VI-based statistics block also contains two MCB pointers for
metering traffic bound for software. One pointer is for best effort traffic and the
other is for control traffic. Software must initialize these pointers to 0 if metering
is not enabled.

When IP/MPLS packets arrive at the ingress, the PFE uses the QOS
pointer in the VI-based ingress stats block. This pointer indicates how the
hardware translates the incoming TOS/EXP field into the LQ header's PRI and
RM fields. If the pointer is NULL then the translation is skipped.

Similarly, as a final step before transmitting an IP/MPLS packet, the
hardware takes the updated LQ header PRI and RM fields and reverse translates
these back to the packet's TOS/EXP field. Again, if the QOS pointer is NULL
then the translation is skipped.

The ingress QOS translation pointer resides in the last 4 bytes of the
VI-based ingress stats block. For IP packets the ingress table consists of 256
entries, indexed by the incoming packet's IP header TOS field. For MPLS packets
the ingress table consists of 8 entries, indexed by the incoming packet's MPLS
EXP field. Each entry is 8 bytes wide (4B mask, 4B value). The ingress table
entry format is described below:

Word  Bits   Name       Description
0     31:25  Reserved   Should be zero.
      24:23  RM Mask    Rate Marking Mask. Only bits to be replaced should
                        be high.
      22:20  PRI Mask   Priority Mask. Only bits to be replaced should be
                        high.
      19:0   Reserved   Should be zero.
1     31:25  Reserved   Should be zero.
      24:23  RM Value   New Rate Marking value.
      22:20  PRI Value  New Priority value.
      19:0   Reserved   Should be zero.

Table 13 Ingress QOS Translation Table Entry Format for IP and MPLS
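A mask/value entry of this kind implies a masked replace. The Python sketch below builds an entry with the Table 13 bit positions and applies it to a 32-bit word; the function names are hypothetical, and the assumption that the target word carries RM and PRI at the same bit positions as the table entry is not confirmed by the description.

```python
from typing import Optional, Tuple

RM_SHIFT, RM_WIDTH = 23, 2    # bits 24:23
PRI_SHIFT, PRI_WIDTH = 20, 3  # bits 22:20


def make_entry(rm: Optional[int] = None, pri: Optional[int] = None) -> Tuple[int, int]:
    """Build a (mask, value) entry; a field left as None is not replaced."""
    mask = value = 0
    if rm is not None:
        mask |= ((1 << RM_WIDTH) - 1) << RM_SHIFT
        value |= (rm & ((1 << RM_WIDTH) - 1)) << RM_SHIFT
    if pri is not None:
        mask |= ((1 << PRI_WIDTH) - 1) << PRI_SHIFT
        value |= (pri & ((1 << PRI_WIDTH) - 1)) << PRI_SHIFT
    return mask, value


def apply_entry(word: int, mask: int, value: int) -> int:
    """Replace only the bits that are high in the mask."""
    return (word & ~mask & 0xFFFFFFFF) | (value & mask)
```

An entry built with only `pri` set leaves the RM bits of the word untouched, which matches the "only bits to be replaced should be high" rule in the table.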

The egress QOS translation pointer resides in word 12 of the associated
TCB. The egress table consists of 32 entries indexed by the concatenation of the
outgoing packet's {RM, PRI} SF header fields (the RM bits reside in the MSBs
of the table index). Each entry is 8 bytes wide (4B mask, 4B value). The egress
table entry formats for IP and MPLS packets are described below:

Word  Bits   Name       Description
0     31:24  Reserved   Should be zero.
      23:16  TOS Mask   TOS Mask. Only bits to be replaced should be high.
      15:0   Reserved   Should be zero.
1     31:24  Reserved   Should be zero.
      23:16  TOS Value  New TOS value.
      15:0   Reserved   Should be zero.

Table 14 Egress QOS Table Entry Format for IP

Word  Bits   Name       Description
0     31:12  Reserved   Should be zero.
      11:9   EXP Mask   EXP Mask. Only bits to be replaced should be high.
      8:0    Reserved   Should be zero.
1     31:12  Reserved   Should be zero.
      11:9   EXP Value  New EXP value.
      8:0    Reserved   Should be zero.

Table 15 Egress QOS Table Entry Format for MPLS
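The egress table index and the IP TOS rewrite can be sketched in Python as follows. The function names are hypothetical. The index assumes a 2-bit RM and a 3-bit PRI, as in Table 13, which gives the 32 entries described; the TOS mask and value are read from bits 23:16 of each entry word, as in Table 14.

```python
def egress_index(rm: int, pri: int) -> int:
    """Concatenate {RM, PRI} with the RM bits in the most significant bits."""
    if not 0 <= rm < 4 or not 0 <= pri < 8:
        raise ValueError("RM is 2 bits wide and PRI is 3 bits wide")
    return (rm << 3) | pri


def rewrite_tos(tos: int, mask_word: int, value_word: int) -> int:
    """Apply an egress IP entry (word 0 = mask, word 1 = value) to a TOS byte."""
    mask = (mask_word >> 16) & 0xFF
    value = (value_word >> 16) & 0xFF
    return (tos & ~mask & 0xFF) | (value & mask)
```

For instance, a mask of 0xFC with a value of 0xB8 replaces the upper six bits of the TOS byte and preserves the lower two.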

The PFE hardware maintains packet statistics for all packets in Statistics
Block data structures. The PFE updates both StatsOutPtr0 and StatsOutPtr1 egress
packet statistics after header transformation. Along with the TCB stats block
pointers for egress StatsOutPtr0 and StatsOutPtr1 flow statistics, the PFE also
maintains per-VI ingress statistics using per-protocol tables indexed by LQID.

Each statistics block contains three sets of counters: one set for normal
packets and bytes, another for dropped packets and bytes, and a third for packets
with errors. The stats block also contains a field for counting the number of
packets sent out as a result of packet fragmentation. There is a reserved field at
the bottom of the stats block that is used for indicating ingress-VI metering
control information. It should be initialized to 0 when the stats block is
allocated.



Word   Bits   Name              Description
0:1    63:0   Trans_pkts        Number of packets transmitted.
2:3    63:0   Trans_bytes       Number of bytes transmitted.
4:5    63:0   Dropped_pkts      Number of packets dropped.
6:7    63:0   Dropped_bytes     Number of bytes dropped.
8:9    63:0   Error_pkts        Number of packets with errors.
10:11  63:0   Error_bytes       Number of bytes with errors.
12     31:0   MeterSwBEPtr      Pointer to meter block for software bound best
                                effort traffic.
13     31:0   MeterSwCtlPtr     Pointer to meter block for software bound
                                control traffic.
14     31:0   LQID Metering     Pointer to ingress VI rate-limiting control
              Ptr               block. Software should initialize this field
                                to 0 when allocating the stats block.
15     31:8   FlowCapIndex      Index into table of Flow cap structures.
       15:10  Flag bits         Mode dependent.
       9:8    Mode              0 - Normal, 1 - L2 VPN, 2:3 - Reserved.
       7:0    IngressQosIad     Index into an array of TOS to RM/PRI
                                translation tables. Software should initialize
                                this field to 0 (disabled) when allocating the
                                stats block.

Table 16 Ingress LQID Statistics Block



Word   Bits   Name            Description
0:1    63:0   Trans_pkts      Number of packets transmitted.
2:3    63:0   Trans_bytes     Number of bytes transmitted.
4:5    63:0   Dropped_pkts    Number of packets dropped.
6:7    63:0   Dropped_bytes   Number of bytes dropped.
8:9    63:0   Error_pkts      Number of packets with errors.
10:11  63:0   Error_bytes     Number of bytes with errors.
12:13  63:0   Frag_pkts       Number of fragment packets transmitted.
14:15  63:0   Frag_bytes      Number of fragment bytes transmitted.

Table 17 Egress Flow Statistics Bytes

The stats block pointer is bimodal in that it can point to a single stats
block or, in the future, to an array of stats block pointers. In array mode, the
host software can associate up to 8 stats blocks with each of the TCB stats
pointer fields. The PFE will traverse the table of pointers starting at the first
entry and continuing as directed by the Next Pointer field. Software disables a
table entry by setting all 4 bytes of the stats block pointer to 0. StatsOutPtr1
of the TCB is always assumed to be enabled, to save instructions. If either
StatsOutPtr0 or StatsOutPtr1 is set up to point to something other than a stats
block, then that block, and eventually other memory blocks, can be dangerously
corrupted.

Bits   Field Name     Description
31:5   Pointer        PFE memory address of the associated stats block. The
                      stats block is assumed to be 64-byte aligned.
4:2    Reserved
1      Pointer Mode   Defines whether the pointer field points to a stats
                      block or to an array of stats block pointers:
                      0: Stats Block
                      1: Stats Block Pointer Array
0      Next Pointer   This field indicates whether the hardware should
                      continue to the next stats block pointer in an array:
                      0: Stop after the current pointer.
                      1: Continue to the next stats block pointer.

Table 18 Statistics Block Pointer Format
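The array traversal described above might look like the following Python sketch. The function name is hypothetical, and the treatment of a disabled (all-zero) entry is an assumption: because its Next Pointer bit is also 0, the sketch skips the entry and then stops.

```python
from typing import Iterator, Sequence

MAX_ARRAY_ENTRIES = 8
POINTER_MASK = 0xFFFFFFE0  # bits 31:5 of the pointer word
NEXT_BIT = 0x1             # bit 0: continue to the next pointer


def walk_stats_pointers(entries: Sequence[int]) -> Iterator[int]:
    """Yield the stats block address of each enabled entry, in order.

    Traversal starts at the first entry and continues only while the
    Next Pointer bit of the current entry is set, for at most 8 entries.
    """
    for word in entries[:MAX_ARRAY_ENTRIES]:
        if word != 0:
            yield word & POINTER_MASK
        if not word & NEXT_BIT:
            break
```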

In both prefix-mode and flow-mode, the PFE hardware maintains per-VI
ingress statistics in a set of tables of stats blocks indexed by the packet's LQID
and LQ protocol. The hardware selects a table using the packet's LQ protocol field
and then selects the table entry using the LQID as an index. Per-VI ingress
statistics are maintained for every packet.

The PFE hardware supports Network Address Translation for IP addresses
and for TCP/UDP port addresses. When software enables IP or TCP/UDP NAT, it
must also provide the associated replacement addresses and checksum adjustments
in the corresponding TCB fields. When the hardware detects that one of the NAT
enable bits is set to '1', it will always replace both the source and destination
addresses. If software intends to translate only the source address, it must still
supply the correct destination address in the TCB replacement field. Similarly, the
software must also supply the correct source address in the TCB replacement field
when it is just replacing the destination address.

The checksum adjustment should be computed as follows:

ChkAdj = aNew + ~aOld + bNew + ~bOld + cNew + ~cOld

where + is a one's complement addition (meaning any carry bits are looped
back and added to the LSB) and ~ is the inversion of all bits.
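The adjustment formula can be sketched in Python as follows. The function names are hypothetical, and 16-bit words are an assumption based on the width of the IP, TCP, and UDP checksums; the description does not state the operand width.

```python
from typing import Iterable, Tuple

WORD_MASK = 0xFFFF  # assumption: 16-bit checksum words


def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit words, looping any carry back into the LSB."""
    total = a + b
    return (total & WORD_MASK) + (total >> 16)


def checksum_adjustment(pairs: Iterable[Tuple[int, int]]) -> int:
    """Compute ChkAdj = sum of (new + ~old) over (old, new) word pairs."""
    adj = 0
    for old, new in pairs:
        adj = ones_complement_add(adj, new & WORD_MASK)
        adj = ones_complement_add(adj, ~old & WORD_MASK)
    return adj
```

Changing a word from 0x0001 to 0x0002 gives an adjustment of 0x0001, and the reverse change gives 0xFFFE, which is the one's complement representation of minus one.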

On the ingress side all layer 2 packets are distinguished by bit 5 of the SF
header protocol field being set. The PFE micro-code checks this bit and jumps to
separate L2 header loading logic when it is set. Separate code-points for each
L2/L3 protocol are defined in the SF spec. Jumping to the proper parsing logic is
done by using the entire SF protocol field (including the L2 bit) as an index into
a jump table and jumping to that instruction, which in turn causes a jump to the
proper code segment. One of the functions of the L2 parsing logic is to determine
the size of the variable length L2 headers and increment the SF offset field by
that amount (in some cases, such as the de-tunneling 2nd pass) so that the PFE
egress will strip off that part of the header. In addition, the SF protocol field
may be changed (also in the 2nd pass of de-tunneling) to another protocol type
depending on what the underlying packet type is. This is also determined by the
parsing logic and causes the proper egress code path to be taken.
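The jump-table dispatch can be illustrated with the Python sketch below. The handler names and the 64-entry table size are hypothetical. The two L2TP code points are the values quoted in the de-tunneling discussion, read here as decimal so that bit 5 is set.

```python
from typing import Callable, Dict, List

L2_BIT = 1 << 5   # bit 5 of the SF protocol field marks a layer 2 packet
TABLE_SIZE = 64   # assumption: the protocol field is indexed as 6 bits
L2TP_LAC = 41     # code points quoted in the description (L2 bit included)
L2TP_LNS = 42

Handler = Callable[[bytes], str]


def is_l2(protocol: int) -> bool:
    """True when the L2 bit of the SF protocol field is set."""
    return bool(protocol & L2_BIT)


def build_jump_table(handlers: Dict[int, Handler], default: Handler) -> List[Handler]:
    """Build a table indexed by the entire protocol field, L2 bit included."""
    table = [default] * TABLE_SIZE
    for code, handler in handlers.items():
        table[code] = handler
    return table


def dispatch(table: List[Handler], protocol: int, packet: bytes) -> str:
    """Jump to the parsing logic selected by the protocol code point."""
    return table[protocol & (TABLE_SIZE - 1)](packet)
```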

Tunneling is the trivial case for L2 packet transformation. On the ingress
side a PPP packet arrives (LAC case) and is parsed to get the protocol field for
the hash, and the flow hash is performed to determine the flow index. No SF header
offset or protocol modification is done in this case. The actual tunneling is
performed via the TCB on the egress side.

On the egress side a new header is appended to the packet via normal TCB
processing; in this case the header would include IP/UDP/L2TP headers. Then all
IP/MPLS specific transform logic is skipped and statistics, metering, etc. are
performed. The only new processing on the egress side is to update the ID field of
the newly added IP header and re-compute the IP checksum. To support this, a new
PktDst code-point "Tunnel Interface" has been added. When the micro-code
detects this code-point, the ID field of the IP header (assumed to be just after
the SF header) is modified in a similar fashion as for "Security Engine" destined
packets. The PktIdPtr field in the TCB is used to point to the current packet ID.
The ID is read from memory, used to modify the IP header, incremented, and written
back to memory. In this way all that software needs to do to set up a tunnel is to
set the TCB up with the properly formatted header block, ID header pointer, and
initialized ID value, and set the PktDst field to Tunnel.
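The read, modify, increment, and write-back sequence for the packet ID can be sketched in Python. The function names are hypothetical, the one-element list stands in for the memory word that PktIdPtr references, and the 16-bit wrap of the ID is an assumption. The field offsets are those of a standard IPv4 header.

```python
import struct
from typing import List

IP_ID_OFFSET = 4         # IPv4 Identification field
IP_CHECKSUM_OFFSET = 10  # IPv4 Header Checksum field


def ip_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def stamp_packet_id(ip_header: bytearray, id_store: List[int]) -> None:
    """Write the current packet ID into the header, then increment it."""
    pkt_id = id_store[0]                                   # read ID from memory
    struct.pack_into("!H", ip_header, IP_ID_OFFSET, pkt_id)
    struct.pack_into("!H", ip_header, IP_CHECKSUM_OFFSET, 0)
    struct.pack_into("!H", ip_header, IP_CHECKSUM_OFFSET, ip_checksum(ip_header))
    id_store[0] = (pkt_id + 1) & 0xFFFF                    # write back, 16-bit wrap
```

Recomputing the checksum over a header that already contains a correct checksum yields 0, which gives a quick consistency check.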

For the LNS case an IP packet is received on the ingress side and goes
through normal IP header parsing logic and egress IP processing. The only
difference is that the added TCB header must contain IP/UDP/L2TP/PPP in its
contents. Everything else is as described above for the LAC case.

The De-Tunneling case is much tougher and involves two pass processing
as well as two stage flow learning. In this case the incoming packet consists of
IP-UDP-L2TP-PPP-IP-XXX-payload. The first pass is just like any normal IP packet:
on the ingress the IP header is parsed and the flow hash is performed to determine
a flow index. On the egress side normal IP TCB processing will be performed.
Software must set up the new header block with a new SF header such that the
packet comes back (via the SF destination fields) to the ingress side of the PFE
again for the second pass. In addition, this new SF header must contain one of the
newly defined L2 protocol code-points, L2TP_LAC (41 including L2 bit) or
L2TP_LNS (42 including L2 bit), and the SF offset field should be set with the
proper offset to cause the 2nd pass processing to look at the L2TP header.

On the second pass the SF offset field now points to the L2TP-PPP-IP-XXX-
payload part of the packet. Depending on the L2 protocol code-point, the L2
parsing logic will go to different depths into the packet to gather the hash words
for the flow hash. In the L2TP_LAC case only the L2TP header is parsed; in the
L2TP_LNS case the parsing goes into the encapsulated IP and even TCP/UDP
headers if present. This parsing again tells the egress logic how many bytes to
adjust the SF offset field by and which protocol to change the SF protocol field
to. For the LAC case the protocol field will be changed to PPP and the offset
field adjusted to point at the PPP header; for LNS it is changed to IP and the
offset adjusted to point at the IP header. Changing the protocol and offset fields
in this manner causes the egress side to process what is left of the packet in the
proper manner. The LAC case results in a PPP packet being sent to the egress
logic; in this case all IP/MPLS specific logic is skipped and only stats/metering
micro-code is executed. In the LNS case an IP packet is presented to the egress
and processed the same way as any other IP packet.

Tunneling packets via GRE is performed in exactly the same manner as for
L2TP. IP/MPLS or some other packet type is received on the ingress side and
processed normally by the ingress micro-code. On the egress side normal TCB
header processing adds a SF/IP/GRE header to the packet. The PktDst "Tunnel" is
detected, which tells the egress micro-code to modify the outer IP header of the
outgoing packet (in the same manner as described for L2TP), and the tunneling is
complete.

De-Tunneling of GRE packets is done in the same two pass manner as
L2TP, with only the 2nd pass parsing logic being different. On the ingress side an
IP packet is received (with protocol = 47) and processed normally. On the egress
side the TCB adds a new SF header containing the L2 protocol type GRE, and a SF Dst
field which causes the packet to be switched back to the ingress for a 2nd pass.
Again, this SF header should also contain an offset field that points 2nd pass
processing to the embedded GRE header.

On the 2nd pass the GRE parsing logic is executed (via the SF protocol jump
table) to gather the required fields for the flow hash and to determine the size of
the L2 header and what underlying protocol is being tunneled. Which fields are used
in the flow hash is determined by the parsing logic depending on what is being
tunneled. The SF offset field is incremented to point at the tunneled packet, and
for IP or MPLS the SF protocol field is changed to the corresponding code-point.

On the 2nd pass egress side the underlying tunneled packet is processed
depending on the SF protocol field. If an IP or MPLS packet was tunneled then
it is processed as any IP or MPLS packet would be. If some other protocol was
tunneled then the protocol field was not changed by the 2nd pass ingress micro-code
and the code-point is still L2 GRE. This packet is processed as any L2 packet,
skipping all IP/MPLS specific transforms and jumping straight to stats/metering. In
this case the underlying tunneled packet is just forwarded as is, without any
special processing.

PWE3 tunneling is a special case, which is done on an LQ basis. In this
case the ingress packet will be received with some L2 protocol (Ethernet, VLAN,
PPP, AAL5, Frame Relay) but will not be processed as such. Instead, a per-LQ
enable will be provided in the statistics block which will tell the micro-code that
special handling is required. For now this feature is enabled when the 2-bit mode
field in the stats block is set to 1 (L2 VPN mode). When this is detected by the
ingress micro-code it causes the flow hash to be computed using only the LQID.
No other "special" processing is done by the ingress micro-code.

On the egress side 2 bits will be provided in the TCB, one for PWE3
enable and one for control word tag enable. The egress micro-code will check the
PWE3 enable bit for all L2 packets and, if it is enabled, will perform special PWE3
handling. This includes stripping of any required headers from the L2 packet and
tagging on of the control word between the SF/Tunnel/VC header added by the
TCB and the remainder of the L2 packet. When the egress micro-code detects an
L2 packet with the PWE3 enable bit set in the TCB, it looks at the SF protocol
field to determine a further course of action. For AAL5 and Frame Relay the control
word is required and some of the L2 packet must be discarded. In these cases the
micro-code will load the proper amount of header (2 byte frame header for frame
relay, and 16 byte Maker header for AAL5) to construct the control word. After
creating the control word the header is discarded by subtracting the correct amount
from the packet length. The control word is then added to the packet at the proper
location on transmit by putting it in the new L3 header. For all other protocols
the control word is optional, so the control word enable bit is checked and, if
set, a "dummy" control word will be added in the same manner as before.

De-Tunneling of PWE3 packets is performed on the egress side with the
addition of a couple of new MPLS operation code-points (POP_L2VPN_NULL
and POP_L2VPN_CTRL). On the ingress side an MPLS packet is received and
hashed normally to determine the flow index. The MPLS tag is actually a "Martini"
VC header and must be popped in a special way by the egress micro-code. When
one of these two MPLS operations is encountered, the micro-code will look at the
protocol field of the new SF header (added via the TCB) to determine what to do
next. If the protocol is AAL5 or Frame Relay then a control word is present and
must be pulled off and used to modify the L2 header template following the SF
header in the TCB header block. Any other protocol field, such as VLAN, Ethernet,
or PPP, will cause the MPLS operation to be looked at again. If the operation is
POP_L2VPN_NULL then de-tunneling is complete; if it is POP_L2VPN_CTRL
then the "dummy" optional control word must be pulled off and discarded before
de-tunneling is complete.
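The PWE3 de-tunneling decision can be summarized in a short Python sketch. The function name, the protocol strings, and the returned action labels are hypothetical; only the two MPLS operation code-points and the branching order come from the description.

```python
CONTROL_WORD_PROTOCOLS = {"AAL5", "FRAME_RELAY"}
PWE3_OPERATIONS = {"POP_L2VPN_NULL", "POP_L2VPN_CTRL"}


def pwe3_detunnel_action(mpls_op: str, sf_protocol: str) -> str:
    """Select the egress action for a PWE3 de-tunneling operation."""
    if mpls_op not in PWE3_OPERATIONS:
        raise ValueError("not a PWE3 de-tunneling operation")
    if sf_protocol in CONTROL_WORD_PROTOCOLS:
        # The control word is present and is used to rebuild the L2 header.
        return "apply_control_word_to_l2_header"
    if mpls_op == "POP_L2VPN_CTRL":
        # Optional "dummy" control word: pull it off and discard it.
        return "discard_dummy_control_word"
    return "done"
```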

The embodiment of the invention described above is but one example
embodiment of the present invention. Although specific embodiments have been
illustrated and described herein, it will be appreciated by those of ordinary skill in
the art that any arrangement which is calculated to achieve the same purpose may
be substituted for the specific embodiments shown. This application is intended to
cover any adaptations or variations of the invention. It is intended that this
invention be limited only by the claims, and the full scope of equivalents thereof.

Claims

1. A method of applying one or more functions to network data packets in a virtual
router, comprising:

receiving a packet comprising a part of a packet flow;

evaluating a packet within the packet flow to determine which of the one or
more functions are to be applied to the flow;

storing the results of the evaluation in a record; and

applying the functions indicated in the stored record to subsequent packets
in the packet flow.

2. The method of claim 1, wherein the one or more functions includes at least one
of encryption, network address translation, packet filtering, metering, quality of
service determination, and IP forwarding.

3. The method of claim 1, wherein evaluating a packet within the packet flow
comprises evaluating the first received packet of a flow.

4. The method of claim 1, wherein storing the results of the evaluation in a record
comprises storing the record in cache memory.

5. The method of claim 1, wherein evaluating a packet within the packet flow to
determine which of the one or more functions are to be applied to the flow
comprises tracking a packet as functions are applied to various network layers; and

wherein storing the results of evaluation comprises storing a record of the
one or more functions applied to the packet tracked through various network layers.

6. A method of recording application of one or more functions to network data
packets in a virtual router, comprising:

receiving a packet comprising a part of a packet flow;

evaluating a packet within the packet flow to determine which of the one or
more functions are to be applied to the flow; and

storing the results of the evaluation in a record.

7. A method of applying one or more functions to network data packets in a virtual
router, comprising:

retrieving a stored record of functions to be applied to packets in a packet 
flow, wherein the stored record indicates which of the one or more functions are to 
be applied to the packet flow; and 

applying the functions indicated in the stored record to subsequent packets 
in the packet flow. 

8. A machine-readable medium with instructions stored thereon, the instructions 
when executed operable to cause application of one or more functions to network 
data packets in a virtual router by: 

receiving a packet comprising a part of a packet flow; 

evaluating a packet within the packet flow to determine which of the one or 
more functions are to be applied to the flow; 

storing the results of the evaluation in a record; and 

applying the functions indicated in the stored record to subsequent packets 
in the packet flow. 

9. The machine-readable medium of claim 8, wherein the one or more functions 
includes at least one of encryption, network address translation, packet filtering, 
metering, quality of service determination, and IP forwarding. 

10. The machine-readable medium of claim 8, wherein evaluating a packet within
the packet flow comprises evaluating the first received packet of a flow.

11. The machine-readable medium of claim 8, wherein storing the results of the
evaluation in a record comprises storing the record in cache memory.

12. The machine-readable medium of claim 8, wherein evaluating a packet within
the packet flow to determine which of the one or more functions are to be applied
to the flow comprises tracking a packet as functions are applied to various network
layers; and

wherein storing the results of evaluation comprises storing a record of the
one or more functions applied to the packet tracked through various network layers.

13. A machine-readable medium with instructions stored thereon, the instructions
when executed operable to record application of one or more functions to network
data packets in a virtual router by:

receiving a packet comprising a part of a packet flow;

evaluating a packet within the packet flow to determine which of the one or
more functions are to be applied to the flow; and

storing the results of the evaluation in a record.

14. A machine-readable medium with instructions stored thereon, the instructions
when executed operable to cause application of one or more functions to network
data packets in a virtual router by:

retrieving a stored record of functions to be applied to packets in a packet
flow, wherein the stored record indicates which of the one or more functions are to
be applied to the packet flow; and

applying the functions indicated in the stored record to subsequent packets
in the packet flow.

15. A virtual router system operable to apply one or more functions to network
data packets in a virtual router, comprising:

a network interface for receiving a packet comprising a part of a packet
flow;

digital logic for evaluating a packet within the packet flow to determine
which of the one or more functions are to be applied to the flow;

storage for storing the results of the evaluation in a record; and

a hardware forwarding engine for applying the functions indicated in the
stored record to subsequent packets in the packet flow.

16. The virtual router system of claim 15, wherein the one or more functions
includes at least one of encryption, network address translation, packet filtering,
metering, quality of service determination, and IP forwarding.

17. The virtual router system of claim 15, wherein evaluating a packet within the 
packet flow comprises evaluating the first received packet of a flow. 

18. The virtual router system of claim 15, wherein the storage comprises cache 
memory. 

19. The virtual router system of claim 15, wherein digital logic for evaluating a 
packet comprises logic for tracking a packet as functions are applied to various 
network layers; and 

wherein the results of the evaluation stored in a record in storage comprise a 
record of the one or more functions applied to the packet tracked through various 
network layers. 

20. A virtual router system operable to record application of one or more functions
to network data packets in a virtual router, comprising:

a network interface for receiving a packet comprising a part of a packet
flow;

digital logic for evaluating a packet within the packet flow to determine
which of the one or more functions are to be applied to the flow;

storage for storing the results of the evaluation in a record.

21. A virtual router system operable to apply one or more functions to network
data packets in a virtual router, comprising:

a storage device operable to store a record of functions to be applied to
packets in a packet flow, wherein the stored record indicates which of the one or
more functions are to be applied to the packet flow; and

a hardware forwarding engine operable to retrieve the stored record from
storage and to apply the functions indicated in the stored record to subsequent
packets in the packet flow.